24 research outputs found

Developing and applying heterogeneous phylogenetic models with XRate

Author: A Heger
A Siepel
A Varadarajan
AJ Drummond
B Knudsen
Christos A. Ouzounis
D Ayres
DB Searls
E Birney
G Lunter
GSC Slater
Ian Holmes
IM Meyer
J Felsenstein
J Goecks
J Watts
JS Pedersen
L Stein
M Garber
M Hasegawa
M Kimura
M Zuker
ME Skinner
N Saitou
O Penn
Oscar Westesson
PS Klosterman
RK Bradley
SR Eddy
TH Jukes
WJ Kent
Z Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 16/02/2012

Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models that account for the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models that would previously have been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART

Comment: 34 pages, 3 figures, glossary of XRate model terminology

arXiv.org e-Print Archive

Crossref

PubMed Central

FigShare

Accurate reconstruction of insertion-deletion histories by statistical phylogenetics

Author: A Heger
A Löytynoja
A Siepel
AG Clark
AM Moses
Art F. Y. Poon
B Knudsen
B Paten
B Rannala
Benedict Paten
C Lee
C Strope
DG Higgins
EF Moore
FA Matsen
FR Kschischang
G Lunter
Gerton Lunter
I Holmes
I Miklós
Ian Holmes
J Felsenstein
JD Thompson
JL Thorne
JS Pedersen
K Katoh
K Liu
KM Wong
KS Pollard
L Gomez-Valero
L Zhu
M Larkin
M Mohri
MA Suchard
N de la Chaux
O Kamneva
O Westesson
Oscar Westesson
P Markova-Raina
R Mills
RA Cartwright
RC Edgar
RK Bradley
S Nelesen
S Saccone
S Sinha
T Beissbarth
X Qu
Z Wang
Z Yang
Z Zhang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes.

Comment: 28 pages, 15 figures. arXiv admin note: text overlap with arXiv:1103.434

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive

FigShare

Accurate Detection of Recombinant Breakpoints in Whole-Genome Alignments

Author: A Rambaut
AC Siepel
Aviv Regev
D Filho
D Husmeier
G McGuire
Ian Holmes
J Archer
J Felsenstein
J Hein
JD Thompson
JP Gomes
K Lau
K Lole
LD Bowler
M Arenas
M Hasegawa
M Thomson
MJ Minichiello
N Friedman
Oscar Westesson
P Awadalla
P Puigbo
R Durbin
R Hudson
RC Edgar
RC Elston
TJ Anderson
VN Minin
YS Song
Publication venue: Public Library of Science
Publication date: 01/03/2009

We propose a novel method for detecting sites of molecular recombination in multiple alignments. Our approach is a compromise between two previous extremes: mathematically rigorous but computationally prohibitive methods, and imprecise heuristic methods. Using a combined algorithm for estimating tree structure and hidden Markov model parameters, our program detects changes in phylogenetic tree topology over a multiple sequence alignment. We evaluate our method on benchmark datasets from previous studies on two recombinant pathogens, Neisseria and HIV-1, as well as simulated data. We show that we can detect not only recombinant regions of vastly different sizes but also the locations of breakpoints with great accuracy. We show that our method does well at inferring recombination breakpoints while remaining practical for larger datasets. In all cases, we confirm the breakpoint predictions of previous studies, and in many cases we offer novel predictions.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Statistical phylogenetic methods with applications to virus evolution

Author: Oscar Westesson
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

This thesis explores methods for computational comparative modeling of genetic sequences. The framework within which this modeling is undertaken is that of sequence alignments and associated phylogenetic trees. The first part explores methods for building ancestral sequence alignments making explicit use of phylogenetic likelihood functions. New capabilities of an existing MCMC alignment sampler are discussed in detail, and the sampler is used to analyze a set of HIV/SIV gp120 proteins. An approximate maximum-likelihood alignment method is presented, first in a tutorial-style format and later in precise mathematical terms. An implementation of this method is evaluated alongside leading alignment programs. The second part describes methods utilizing multiple sequence alignments. First, mutation rate is used to predict positional mutational sensitivities for a protein. Second, the flexible, automated model-specification capabilities of the XRate software are presented. The final chapter presents recHMM, a method to detect recombination among sequences by use of a phylogenetic hidden Markov model with a tree in each hidden state.

eScholarship - University of California

ProQuest OAI Repository

Statistical phylogenetic methods with applications to virus evolution

Author: Oscar Westesson
Publication venue: California Digital Library (CDL)
Publication date: 01/01/2012

ProQuest OAI Repository

The model used by PhastCons, a 3-nonterminal HMM with rate multipliers, is compactly expressed by XRate's macro language.

Author: Ian Holmes
Oscar Westesson

Different nonterminals have different evolutionary rates, but they all share the same underlying substitution model. Transition probabilities are shared: a transition between nonterminals happens with probability leaveProb, and self-transitions happen with probability stayProb. This model (with any number of nonterminals) can be expressed in XRate's macro language in approximately 20 lines of code.
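The structure described in this caption can be illustrated outside XRate with a short Python sketch. This is not XRate macro code and not the authors' implementation. It assumes that stayProb = 1 - leaveProb, that the leaving probability is split evenly among the other nonterminals, and that a Jukes-Cantor matrix stands in for the shared substitution model. The function names and the example rate multipliers are hypothetical.

```python
def build_transitions(n_states, leave_prob):
    """Transition matrix shared by all nonterminals.

    Assumption (not from the source): stay_prob = 1 - leave_prob, and
    leave_prob is divided evenly among the other n_states - 1 nonterminals.
    """
    if n_states < 2:
        raise ValueError("need at least two nonterminals")
    if not 0.0 <= leave_prob <= 1.0:
        raise ValueError("leave_prob must lie in [0, 1]")
    stay_prob = 1.0 - leave_prob
    per_target = leave_prob / (n_states - 1)
    return [
        [stay_prob if i == j else per_target for j in range(n_states)]
        for i in range(n_states)
    ]


def jukes_cantor(rate=1.0):
    """Placeholder shared substitution model (an assumption): 4x4 rate matrix."""
    return [
        [-rate if i == j else rate / 3.0 for j in range(4)]
        for i in range(4)
    ]


def scale_rate_matrix(shared_rates, multiplier):
    """Each nonterminal reuses the shared matrix, scaled by its own multiplier."""
    return [[multiplier * r for r in row] for row in shared_rates]


def build_model(leave_prob, multipliers):
    """Bundle the shared transitions with one scaled rate matrix per nonterminal."""
    shared = jukes_cantor()
    return {
        "transitions": build_transitions(len(multipliers), leave_prob),
        "rate_matrices": [scale_rate_matrix(shared, m) for m in multipliers],
    }


# Hypothetical example: three nonterminals evolving at slow, medium and fast rates.
model = build_model(leave_prob=0.1, multipliers=[0.25, 0.5, 1.0])
```

Because only the multipliers differ between nonterminals, adding a further rate class means appending one number to the list, which mirrors the caption's point that the model extends to any number of nonterminals.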

FigShare

A schematic of a DLESS-style phylo-HMM: each node of the tree has its own nonterminal, such that the node-rooted subtree evolves at a slower rate than the rest of the tree.

Author: Ian Holmes
Oscar Westesson

Inferring the pattern of hidden nonterminals generating an alignment allows regions of lineage-specific selection to be detected. Expressing this model compactly in XRate's macro language allows it to be used with any input tree without having to write data-specific code or use external model-generating scripts.

FigShare

Data from several XRate analyses, shown alongside genes (A) and known RNA structures (B) in poliovirus.

Author: Ian Holmes
Oscar Westesson

XDecoder (C) recovers all known structures with high posterior probability and predicts a promising target for experimental probing (region 6800–7100). XDecoder was run on an alignment of 27 poliovirus sequences with the results visualized as a track in JBrowse [32] via a wiggle file. Alongside XDecoder probabilities are the three signals which XDecoder aims to disentangle: (D) conservation, (E) coding potential, and (F) RNA structure. Paradoxically, the CRE and RNase-L inhibition elements show both conservation and coding sequence preservation, whereas PFOLD's predictions show only a slight increase in probability density around the known structures. XDecoder is the only grammar which returns predictions of reasonable specificity. The full JBrowse instance is included as Text S2.

FigShare